Abstract

We present theoretical properties of the log-concave maximum likelihood 
estimator of a density based on an independent and identically distributed 
sample in $\mathbb{R}^d$. Our study covers both the case where the true underlying density
is log-concave, and where this model is misspecified. We begin by showing that 
for a sequence of log-concave densities, convergence in distribution implies much 
stronger types of convergence - in particular, it implies convergence in Hellinger 
distance and even in certain exponentially weighted total variation norms. In 
our main result, we prove the existence and uniqueness of a log-concave density 
that minimises the Kullback-Leibler divergence from the true density over the 
class of all log-concave densities, and also show that the log-concave maximum
likelihood estimator converges almost surely in these exponentially weighted 
total variation norms to this minimiser. In the case of a correctly specified 
model, this demonstrates a strong type of consistency for the estimator; in a 
misspecified model, it shows that the estimator converges to the log-concave 
density that is closest in the Kullback-Leibler sense to the true density. 

*Address for correspondence: Dr Richard Samworth, Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge, CB3 0WB, UK. Email: r.samworth@statslab.cam.ac.uk
1 Introduction



Although work on shape-constrained density estimation dates back to the celebrated paper of Grenander (1956) on the estimation of a decreasing density on the positive half-line, it is in recent years that the area has enjoyed its most significant interest. This is partly because algorithmic and technological advances now allow the computation of estimators that would not previously have been feasible, and partly because statisticians now have more tools at their disposal for the study of the theoretical properties of these estimators.

The attraction of the use of these estimators is that, in contrast to alternative non- 
parametric density estimation methods such as those based on kernels or wavelets, 
they provide fully automatic procedures, with no smoothing parameters to choose. 
Such procedures are particularly desirable when the data are multidimensional, and 
the choice of (often multiple) smoothing parameters is particularly problematic. 

The properties of the Grenander estimator are now quite well understood, thanks to the works of Marshall and Proschan (1965), Prakasa Rao (1969), Devroye (1987), Birgé (1989), van de Geer (1993) and Balabdaoui et al. (2009). Other examples of shape constraints on univariate densities that have been studied in the literature include convexity (Groeneboom, Jongbloed and Wellner, 2001; Dümbgen, Rufibach and Wellner, 2007) and $k$-monotonicity (Balabdaoui and Wellner, 2008). It is also known that a maximum likelihood estimator does not exist over the class of unimodal densities - cf. Birgé (1997).



Log-concave densities have also received attention recently - see, for example, Walther (2002), Dümbgen, Hüsler and Rufibach (2007), Pal, Woodroofe and Meyer (2007), Dümbgen and Rufibach (2009), Balabdaoui, Rufibach and Wellner (2009). The class of univariate log-concave densities includes many common parametric families, such as the normal, $\Gamma(\alpha, \beta)$ ($\alpha \ge 1$), $\mathrm{Beta}(\alpha, \beta)$ ($\alpha, \beta \ge 1$), $\mathrm{Weibull}(\alpha)$ ($\alpha \ge 1$), Gumbel, logistic and Laplace densities; see Bagnoli and Bergstrom (1989) for other examples. Among the desirable properties enjoyed by the class are the facts that it is closed under convolution (e.g. Dharmadhikari and Joag-Dev (1988, Theorem 2.18)) and the taking of pointwise limits.
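To make the membership claim concrete, the following is a minimal numerical sketch (not part of the original argument) that checks log-concavity of a few of the families named above by testing that the second difference of the log-density is non-positive on a grid. The function name `is_log_concave_on_grid`, the grid and the tolerances are illustrative assumptions; a grid check of this kind can only suggest log-concavity, not prove it.

```python
import math

def is_log_concave_on_grid(log_density, grid, h=1e-2, tol=1e-6):
    """Return True if the central second difference of log_density is <= tol at every grid point."""
    for x in grid:
        second_diff = log_density(x + h) - 2.0 * log_density(x) + log_density(x - h)
        if second_diff / (h * h) > tol:
            return False
    return True

# Log-densities in their standard parametrisations (location 0, scale 1).
def normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def laplace_logpdf(x):
    return -abs(x) - math.log(2.0)

def logistic_logpdf(x):
    return -x - 2.0 * math.log1p(math.exp(-x))

def gumbel_logpdf(x):
    return -x - math.exp(-x)

# For contrast: the standard Cauchy density is not log-concave.
def cauchy_logpdf(x):
    return -math.log(math.pi) - math.log1p(x * x)

grid = [i / 10.0 for i in range(-50, 51)]
results = {
    "normal": is_log_concave_on_grid(normal_logpdf, grid),
    "laplace": is_log_concave_on_grid(laplace_logpdf, grid),
    "logistic": is_log_concave_on_grid(logistic_logpdf, grid),
    "gumbel": is_log_concave_on_grid(gumbel_logpdf, grid),
    "cauchy": is_log_concave_on_grid(cauchy_logpdf, grid),
}
print(results)
```

The Cauchy density is included only to show that the check can fail; its log-density is convex for $|x| > 1$.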



The existence and uniqueness of a log-concave maximum likelihood estimator $\hat{f}_n$ of a density $f_0$ based on a sample $X_1, \ldots, X_n$ in $\mathbb{R}^d$ was proved in Cule, Samworth and Stewart (2007). There, the structure of $\hat{f}_n$ was outlined and an algorithm for its computation was described. The algorithm was implemented in the R package LogConcDEAD (Cule, Gramacy and Samworth, 2007, 2009).



In this paper, we study the statistical properties of the estimator. Importantly, our 
results do not assume that the underlying density is log-concave. To the best of our 
knowledge, such results have not been obtained before even for univariate data, but 
are of interest because in practice it is impossible to tell from a sample of data whether 
the assumption of log-concavity is satisfied. It is therefore natural to seek assurance 
that the estimator will behave sensibly if the condition is violated. In our main 
result (cf. Theorem 4 below), we prove under very mild conditions the existence and uniqueness of a log-concave density $f^*$ that minimises the Kullback-Leibler divergence from $f_0$ and show that there is an interval of values of $a$ for which

$$\int_{\mathbb{R}^d} e^{a\|x\|} |\hat{f}_n(x) - f^*(x)| \, dx \stackrel{a.s.}{\to} 0$$

as $n \to \infty$. Moreover, if $f^*$ is continuous, then $\sup_{x \in \mathbb{R}^d} e^{a\|x\|} |\hat{f}_n(x) - f^*(x)| \stackrel{a.s.}{\to} 0$ as $n \to \infty$. The upper bound for the range of allowable values of $a$ is explicitly linked to the rate of tail decay of $f^*$. In the case where $f_0$ is log-concave, it is well-known that $f^* = f_0$, so the result demonstrates the strong consistency of $\hat{f}_n$ in exponentially weighted total variation norms, and in exponentially weighted supremum norms if $f_0$ is continuous. If the true density is not log-concave, we see that the limiting density is the one that is closest (in the Kullback-Leibler sense) to $f_0$. As described in Section 3 below, this result strengthens what was previously known even for the case $d = 1$.

The rest of this paper is organised as follows. In Section 2 we study sequences of log-concave densities that converge in distribution to a limiting density, and demonstrate that the convergence must also occur in much stronger senses. In Section 3 we show that, with probability one, the estimator is uniformly bounded above on $\mathbb{R}^d$, and uniformly bounded below on compact subsets in the interior of the support of the true density. This enables us to state and prove the main result described in the previous paragraph. Further auxiliary results can be found in the Appendix.



2 Convergence of log-concave densities 

We begin with an elementary lemma, whose proof is given in the Appendix. Let $f$ be a log-concave density on $\mathbb{R}^d$.

Lemma 1. There exist $a > 0$ and $b \in \mathbb{R}$ such that $f(x) \le e^{-a\|x\|+b}$ for all $x \in \mathbb{R}^d$.

Notice in particular that if $X$ has density $f$, then Lemma 1 implies that the moment generating function of $X$ is finite in an open neighbourhood of the origin.
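As a concrete instance of Lemma 1, consider the standard normal density on $\mathbb{R}$. Since $(|x| - 1)^2 \ge 0$, we have $-x^2/2 \le -|x| + 1/2$, so the bound holds with $a = 1$ and $b = 1/2 - \frac{1}{2}\log(2\pi)$. The sketch below checks this numerically; the particular constants and the grid are our own illustrative choices, not taken from the proof in the Appendix.

```python
import math

def normal_pdf(x):
    """Standard normal density on the real line."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Constants for the exponential envelope exp(-a|x| + b); these follow from (|x| - 1)^2 >= 0.
a = 1.0
b = 0.5 - 0.5 * math.log(2.0 * math.pi)

def envelope(x):
    return math.exp(-a * abs(x) + b)

grid = [i / 100.0 for i in range(-1000, 1001)]
# The ratio f(x) / envelope(x) should never exceed 1 (equality holds at |x| = 1).
max_ratio = max(normal_pdf(x) / envelope(x) for x in grid)
print(max_ratio)
```

The envelope touches the density at $|x| = 1$, so the maximal ratio is 1 up to rounding error.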

If $f, f_1, f_2, \ldots$ are densities on $\mathbb{R}^d$, we write $f_n \stackrel{d}{\to} f$ for the convergence in distribution of the corresponding measures; in other words, $f_n \stackrel{d}{\to} f$ means $\int g(x) f_n(x) \, dx \to \int g(x) f(x) \, dx$ for all bounded, continuous functions $g : \mathbb{R}^d \to \mathbb{R}$. The following result shows that when it is known that a sequence of densities is log-concave, convergence in distribution in fact implies much stronger forms of convergence. A similar result, proved independently at around the same time and using different techniques, can be found in Schumacher, Hüsler and Dümbgen (2009). We write $\mu$ for Lebesgue measure on $\mathbb{R}^d$.



Proposition 2. Let $(f_n)$ be a sequence of log-concave densities on $\mathbb{R}^d$ with $f_n \stackrel{d}{\to} f$ for some density $f$. Then:

(a) $f$ is log-concave

(b) $f_n \to f$, $\mu$-almost everywhere

(c) Let $a_0 > 0$ and $b_0 \in \mathbb{R}$ be such that $f(x) \le e^{-a_0\|x\|+b_0}$. Then for every $a < a_0$, we have $\int_{\mathbb{R}^d} e^{a\|x\|} |f_n(x) - f(x)| \, dx \to 0$ and, if $f$ is continuous, $\sup_{x \in \mathbb{R}^d} e^{a\|x\|} |f_n(x) - f(x)| \to 0$.
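To illustrate part (c), the sketch below takes the univariate example $f_n = N(0, (1 + 1/n)^2)$ and $f = N(0, 1)$, for which $f_n \stackrel{d}{\to} f$, and evaluates the exponentially weighted total variation distance with $a = 1$ by a simple Riemann sum. The example, the truncation of the integral to $[-30, 30]$ and the step size are illustrative assumptions of ours; the computation is a numerical check, not part of the proof.

```python
import math

def normal_pdf(x, sigma):
    """Density of N(0, sigma^2) at x."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def weighted_tv(n, a=1.0, lim=30.0, step=0.01):
    """Riemann-sum approximation to the integral of exp(a|x|) |f_n(x) - f(x)| over [-lim, lim]."""
    sigma_n = 1.0 + 1.0 / n
    m = int(round(2.0 * lim / step))
    total = 0.0
    for i in range(m + 1):
        x = -lim + i * step
        total += math.exp(a * abs(x)) * abs(normal_pdf(x, sigma_n) - normal_pdf(x, 1.0)) * step
    return total

values = [weighted_tv(n) for n in (1, 10, 100, 1000)]
print(values)
```

In this example the weighted distances decrease towards zero as $n$ grows, as the proposition predicts.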



Proof. (a) This part of the proposition can be deduced from Theorems 2.8 and 2.10 of Dharmadhikari and Joag-Dev (1988). Their proof relies on a non-trivial correspondence between log-concave probability measures and log-concave densities, which in turn depends on several other facts about log-concavity - cf. Dharmadhikari and Joag-Dev (1988, pp. 46-54). We give an alternative proof because it is perhaps a little more direct, and because it forms part of the proof of part (b) below.



Let $f_n \stackrel{d}{\to} f$. Our proof relies on two crucial results. The first is that if $\mathcal{D}$ is the class of all Borel-measurable, convex subsets of $\mathbb{R}^d$, then

$$\sup_{D \in \mathcal{D}} \Bigl| \int_D (f_n - f) \Bigr| \to 0$$

as $n \to \infty$ (Bhattacharya and Rao, 1976, Theorem 1.11, p. 22). The second is Lebesgue's differentiation theorem: recall that a family $\{A_\delta : \delta > 0\}$ of Borel subsets of $\mathbb{R}^d$ shrinks nicely to $x_0 \in \mathbb{R}^d$ with eccentricity bound $\eta > 0$ if $A_\delta \subseteq B_\delta$, where $B_\delta$ is the closed (Euclidean) ball of radius $\delta$ centered at $x_0$, and if the family is such that $\mu(A_\delta) \ge \eta \mu(B_\delta)$ for all $\delta > 0$. Then for $\mu$-almost all $x_0$, we have

$$\frac{1}{\mu(A_\delta)} \int_{A_\delta} |f(x) - f(x_0)| \, dx \to 0 \qquad (2.1)$$

as $\delta \to 0$, for every family $\{A_\delta : \delta > 0\}$ that shrinks nicely to $x_0$ (Folland, 1999, Theorem 3.21). Points $x_0$ at which this convergence holds are called Lebesgue points of $f$; notice that any continuity point of $f$ is certainly a Lebesgue point of $f$. In fact, it will be helpful to note the following small generalisation: if we have a sequence $\{A_{k,\delta} : k \in \mathbb{N}, \delta > 0\}$ of families that shrink nicely to $x_0$ with the same eccentricity bound $\eta$, then the convergence in (2.1) is uniform in $k$.

Consider $E_1 = \{x \in \mathbb{R}^d : \liminf f_n(x) < f(x)\}$, and suppose for a contradiction that $\mu(E_1) > 0$. Then there exists a Lebesgue point $x_0$ of $f$ satisfying $x_0 \in E_1$ and $f(x_0) < \infty$. Letting $\epsilon = f(x_0) - \liminf f_n(x_0)$, find a subsequence $(f_{n_k})$ with $f_{n_k}(x_0) < f(x_0) - 3\epsilon/4$, and for $\delta > 0$, define the convex sets

$$D_{k,\delta} = \{x \in B_\delta : f_{n_k}(x) \ge f(x_0) - \epsilon/2\}.$$

Observe by the concavity of $\log f_{n_k}$ that if $x \in D_{k,\delta}$ then $2x_0 - x \in B_\delta \setminus D_{k,\delta}$. It follows that $\mu(B_\delta \setminus D_{k,\delta}) \ge \mu(B_\delta)/2$. This means that we can apply Lebesgue's differentiation theorem to choose $\delta > 0$ small enough that for every $k$,

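$$\frac{1}{\mu(B_\delta \setminus D_{k,\delta})} \int_{B_\delta \setminus D_{k,\delta}} \{f(x_0) - f(x)\} \, dx \le \frac{\epsilon}{4}.$$

But then

$$\int_{B_\delta} (f - f_{n_k}) = \int_{B_\delta \setminus D_{k,\delta}} \{f(x) - f(x_0) + f(x_0) - f_{n_k}(x)\} \, dx + \int_{D_{k,\delta}} (f - f_{n_k})$$

$$\ge -\frac{\epsilon}{4} \mu(B_\delta \setminus D_{k,\delta}) + \frac{\epsilon}{2} \mu(B_\delta \setminus D_{k,\delta}) - \sup_{D \in \mathcal{D}} \Bigl| \int_D (f - f_{n_k}) \Bigr|.$$

We conclude that $\liminf_k \int_{B_\delta} (f - f_{n_k}) \ge \epsilon \mu(B_\delta)/8$, a contradiction. Hence $\mu(E_1) = 0$.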

Thus, without loss of generality, we may assume $f \le \liminf_n f_n$. But by Fatou's lemma,

$$1 = \int f \le \int \liminf f_n \le \liminf \int f_n = 1,$$

so in fact we may assume $f = \liminf f_n$. Since $\liminf f_n$ is log-concave, this proves (a).

(b) Now suppose that $\mu(E_2) > 0$, where $E_2 = \{x \in \mathbb{R}^d : \limsup f_n(x) > f(x)\}$. Since $f$ is log-concave and so continuous almost everywhere, let $x_0 \in E_2$ be a continuity point of $f$. Define $\epsilon_0 \in (0, \infty]$ by $\epsilon_0 = \limsup f_n(x_0) - f(x_0)$ and set $\epsilon = \min(1, \epsilon_0)$. We can find a subsequence $(f_{n_k})$ with $f_{n_k}(x_0) \ge f(x_0) + 3\epsilon/4$ for all $k$. Define the convex sets

$$D_{k,\delta} = \{x \in B_\delta : f_{n_k}(x) \ge f(x_0) + \epsilon/2\}.$$

There are three cases to consider:

Case (i): Suppose the sequence $\{D_{k,\delta} : k \in \mathbb{N}, \delta > 0\}$ of families shrinks nicely to $x_0$ with the same eccentricity bound $\eta$. Find $\delta > 0$ such that $|f(x) - f(x_0)| \le \epsilon/4$ for all $x \in B_\delta$. Then, for every $k$,




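$$\int_{D_{k,\delta}} (f_{n_k} - f) \ge \frac{\epsilon}{4} \mu(D_{k,\delta}) \ge \frac{\epsilon \eta}{4} \mu(B_\delta),$$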



contradicting our first crucial result. 

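Case (ii): Suppose we are not in Case (i), and that $f(x_0) > 0$, so that by reducing $\epsilon$ if necessary we may assume $f(x_0) > \epsilon/2$. In this case, since for each $k$ the ratio $\mu(D_{k,\delta})/\mu(B_\delta)$ is decreasing as $\delta$ increases, there exist $\delta > 0$ and positive integers $k(1) < k(2) < \ldots$ such that

$$\frac{\mu(D_{k(l),\delta})}{\mu(B_\delta)} \le \frac{\beta^d}{2},$$

where

$$\beta = \frac{\log(f(x_0) + 3\epsilon/4) - \log(f(x_0) + \epsilon/2)}{\log(f(x_0) + 3\epsilon/4) - \log(f(x_0) - \epsilon/2)}.$$

It is straightforward to show, using the concavity of $\log f_{n_k}$, that $\mu(\tilde{D}_{k,\delta}) \le \mu(D_{k,\delta})/\beta^d$, where as above,

$$\tilde{D}_{k,\delta} = \{x \in B_\delta : f_{n_k}(x) \ge f(x_0) - \epsilon/2\}.$$

We may also assume that $|f(x) - f(x_0)| \le \epsilon/4$ for all $x \in B_\delta$. We conclude that for all $l$,

$$\int_{B_\delta} (f_{n_{k(l)}} - f) \le -\frac{\epsilon}{4} \mu(B_\delta \setminus \tilde{D}_{k(l),\delta}) + \sup_{D \in \mathcal{D}} \int_D (f_{n_{k(l)}} - f),$$

so $\limsup_l \int_{B_\delta} (f_{n_{k(l)}} - f) \le -\frac{\epsilon}{8} \mu(B_\delta)$, a contradiction.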

Case (iii): Finally, if $f(x_0) = 0$, then without loss of generality we can find $\delta > 0$ such that $f(x) = 0$ for all $x \in B_\delta$. For each $t > 0$, we have $\mu(\{x \in B_\delta : f_{n_k}(x) \ge t\}) \to 0$ as $k \to \infty$. Choose $t > 0$ small enough that $f_{n_k}(x_0) > t$ and such that there exist points $x_1, \ldots, x_d$ in the interior of the effective domain of $\log f$, denoted $\mathrm{int}(\mathrm{dom} \log f)$, with $f(x_j) > 2t$ and $\mu(\mathrm{conv}\{x_0, x_1, \ldots, x_d\}) > 0$, where $\mathrm{conv}\{x_0, x_1, \ldots, x_d\}$ denotes the convex hull of $\{x_0, x_1, \ldots, x_d\}$. As we cannot have $\mathrm{conv}\{x_0, x_1, \ldots, x_d\}$ contained in $\{x \in \mathbb{R}^d : f_{n_k}(x) \ge t\}$ eventually, there exists $j \in \{1, \ldots, d\}$ and a further subsequence $(f_{n_{k(l)}})$ such that $f_{n_{k(l)}}(x_j) < t$. We then obtain a contradiction as in the proof of Proposition 2(a). Hence $\mu(E_2) = 0$, as required. This proves (b).

(c) Write $\phi_n = \log f_n$ and $\phi = \log f$. Without loss of generality assume $0 \in \mathrm{int}(\mathrm{dom}\, \phi)$, and let $\eta > 0$ be small enough that $B_\eta(0)$, the closed ball of radius $\eta$ about the origin, is contained in $\mathrm{int}(\mathrm{dom}\, \phi)$.

Fix $a \in (0, a_0)$ and let $\delta = a_0 - a$. By Lemma 1 we can find $R > 0$ such that $\frac{1}{\|x\|}\{\phi(x) - \phi(0)\} \le -(a + 3\delta/4)$ for all $\|x\| \ge R/2$. We claim there exists $n_0$ such that

$$\frac{\phi_n(x) - \phi_n(0)}{\|x\|} \le -\Bigl(a + \frac{\delta}{4}\Bigr) \qquad (2.2)$$

for all $\|x\| \ge R$ and $n \ge n_0$. Note that since, for each $n$, the ratio on the left-hand side of (2.2) is a decreasing function of $\|x\|$, it suffices to prove that the inequality in (2.2) holds for all $\|x\| = R$ and $n \ge n_0$. This is straightforward to see if the ball of radius $R$ about the origin is in $\mathrm{int}(\mathrm{dom}\, \phi)$, because in that case $\phi_n \to \phi$ uniformly on this ball (Rockafellar, 1997, Theorem 10.8). In general, however, we argue as follows. Suppose we can find a subsequence $(\phi_{n_k})$ and a sequence $(x_k)$ with $\|x_k\| = R$ such that

$$\frac{\phi_{n_k}(x_k) - \phi_{n_k}(0)}{R} > -\Bigl(a + \frac{\delta}{4}\Bigr)$$

for all $k$. Let $C_k = A_k \cap B_k$, where $A_k = \{\lambda x_k + (1 - \lambda) y : \lambda \in [0, 1], y \in B_\eta(0)\}$ and $B_k = \{y \in \mathbb{R}^d : \|y - x_k\| \le R/2\}$, so that $C_k$ is convex and $\mu(C_k) = C > 0$, independent of $k$.
independent of k. By reducing rj > ii necessary, we may assume Jfl^^j ^ 1 + « r~^ — 



and 1] < R/A. Finally, since (0nfc) is equi-Lipschitzian on Bfj{0) (IRockafellarl . 119971 . 
Theorem 10.8), we may assume rj is small enough that (pnAy) — 4>nk{0) ~ ff ^oi all 
y G Br,{0). Since any x* = Ax^ + (1 - X)y G Ck has XR-f]< ||x*|| < R and A > 1/3, 
we have



0n,(x*) - 0n,(O) ^ Mn.M + (1 - A)0n,(l/) " 0n,(O) 



> 



\x*\\ 8 
' X* 8 - V 2 ' 



From this we deduce that

$$\liminf_{k \to \infty} \int_{C_k} (f_{n_k} - f) \ge C \bigl\{ e^{-\frac{1}{2}(a + \delta/2)R + \phi(0)} - e^{-\frac{1}{2}(a + 3\delta/4)R + \phi(0)} \bigr\},$$

contradicting the first crucial result in the proof of Proposition 2(a). This establishes our claim at (2.2). But this means there exists $b \in \mathbb{R}$ such that $\sup_{n \ge n_0} f_n(x) \le e^{-(a + \delta/4)\|x\| + b}$. From Proposition 2(b) and the dominated convergence theorem we conclude that

$$\int_{\mathbb{R}^d} e^{a\|x\|} |f_n(x) - f(x)| \, dx \to 0.$$



Now suppose that $f$ is continuous and let $\epsilon \in (0, 1)$. Choose $R > 0$ large enough that $f(x) + \sup_{n \ge n_0} f_n(x) \le \epsilon e^{-a\|x\|}/2$ for all $\|x\| \ge R$. Then certainly,

$$\sup_{\|x\| \ge R} e^{a\|x\|} |f_n(x) - f(x)| \le \epsilon$$

for $n \ge n_0$. Using the continuity of $f$, we can choose a closed, convex set $S \subseteq \mathrm{int}(\mathrm{dom}\, \phi) \cap B_R(0)$ such that $f(x) \le \epsilon e^{-aR}/2$ for all $x \in S^c$. Since $f_n \to f$ uniformly on $S$, we may assume $\sup_{x \in S} |f_n(x) - f(x)| \le \epsilon e^{-aR}/2$ for all $n \ge n_0$. Finally, for $x \in B_R(0) \setminus S$, we may assume $\epsilon > 0$ is small enough that $f_n(0) > \epsilon e^{-aR}$ for $n \ge n_0$. Since $f_n(x) \le \epsilon e^{-aR}$ for $x$ on the boundary of $S$ and $n \ge n_0$, it follows that $f_n(x) \le \epsilon e^{-aR}$ for $x \in B_R(0) \setminus S$ and $n \ge n_0$. We deduce that

$$\sup_{x \in \mathbb{R}^d} e^{a\|x\|} |f_n(x) - f(x)| \le \epsilon$$

for all $n \ge n_0$. □
3 Theoretical properties of the log-concave maximum likelihood estimator

Let $X_1, X_2, \ldots$ be an independent and identically distributed sequence with density $f_0$ (not necessarily log-concave), and for $n \ge d + 1$ let $\hat{f}_n$ denote the log-concave maximum likelihood estimator of $f_0$ based on $X_1, \ldots, X_n$. Let $E$ denote the support of $f_0$; that is, the smallest closed set with $\int_E f_0 = 1$. The lemma below establishes appropriate upper and lower bounds for $\hat{f}_n$. Part (a) of the lemma strengthens Theorem 3.2 of Pal, Woodroofe and Meyer (2007), where for the case of univariate data and a log-concave underlying density it was shown that the random variable $\sup_{n \in \mathbb{N}} \sup_{x \in \mathbb{R}^d} \hat{f}_n(x)$ is finite with probability one. To the best of our knowledge, lower bounds such as that appearing in part (b) have not been studied before.

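Lemma 3. Suppose that $\int_{\mathbb{R}^d} \|x\| f_0(x) \, dx < \infty$.

(a) There exists a constant $C > 0$ such that, with probability one,

$$\limsup_{n \to \infty} \sup_{x \in \mathbb{R}^d} \hat{f}_n(x) \le C.$$

(b) Let $S$ be a compact subset of $\mathrm{int}(E)$. There exists a constant $c > 0$ such that, with probability one,

$$\liminf_{n \to \infty} \inf_{x \in \mathrm{conv}\, S} \hat{f}_n(x) \ge c.$$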

Proof. (a) Let $g(x) = \exp(-\|x\| + b)$, where the normalisation constant $b$ is chosen to ensure $g$ is a density, so that

$$\int f_0 \log g = -\int \|x\| f_0(x) \, dx + b = k + 1,$$

say. Now let $C = e^M$, where $M$ is large enough that $M > k + 1$ and such that $\int_D f_0 \le 1/4$ whenever $\mu(D) \le 2^{d+1}(M - k)^d e^{-M}$. Let $f$ be any log-concave density with $\sup_{x \in \mathbb{R}^d} f(x) = C$. We claim that, for sufficiently large $n$, the log-concave density $g$ has higher log-likelihood. More precisely, if 'i.o.' stands for 'infinitely often', we claim that

$$\mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \log f(X_i) \ge \frac{1}{n} \sum_{i=1}^n \log g(X_i) \ \text{i.o.} \Bigr) = 0.$$
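The result follows immediately from this claim. To establish the claim, write $\phi = \log f$, and observe that

$$\mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \phi(X_i) \ge \frac{1}{n} \sum_{i=1}^n \log g(X_i) \ \text{i.o.} \Bigr) \le \mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \log g(X_i) \le k \ \text{i.o.} \Bigr) + \mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \phi(X_i) \ge k \ \text{i.o.} \Bigr). \qquad (3.1)$$

The first term on the right-hand side of (3.1) is zero, by the strong law of large numbers.

To prove the second term on the right-hand side of (3.1) is zero, the idea is to show that there is only a small set on which $\phi$ is large, so with high probability only a small proportion of the observations are in this set. To this end, let $D_t = \{x \in \mathbb{R}^d : \phi(x) \ge t\}$. By concavity of $\phi$, for $t \in [2k - M, M]$, we have

$$\mu(D_t) \ge \Bigl( \frac{M - t}{2(M - k)} \Bigr)^d \mu(D_{2k-M}).$$

It follows by Fubini's theorem that

$$1 \ge \int f \, 1_{\{\log f \ge 2k - M\}} = \int_{\mathbb{R}^d} \int_0^{e^M} 1_{\{t \le f(x)\}} \, dt \, 1_{\{\log f(x) \ge 2k - M\}} \, dx \ge \int_{2k-M}^{M} e^s \mu(D_s) \, ds \ge \frac{e^M \mu(D_{2k-M})}{2^{d+1}(M - k)^d}.$$

Thus $\mathbb{P}(X_1 \in D_{2k-M}) = \int_{D_{2k-M}} f_0 \le 1/4$. We deduce that

$$\mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \phi(X_i) \ge k \Bigr) \le \mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in D_{2k-M}\}} \ge \frac{1}{2} \Bigr) \le e^{-n/8}$$

by Hoeffding's inequality. The first Borel-Cantelli lemma then completes the proof of (a).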

(b) By the concavity of $\log \hat{f}_n$, it suffices to prove the conclusion of this part of the lemma when the infimum over $x \in \mathrm{conv}\, S$ is replaced with an infimum over $x \in S$. Let $S$ be a compact subset of $\mathrm{int}(E)$ and let $\delta > 0$ be small enough that $S^\delta = \{x \in \mathbb{R}^d : \mathrm{dist}(x, S) \le \delta\}$ is contained in $\mathrm{int}(E)$. Consider the function $x_0 \mapsto \int_{B_{\delta/2}} f_0$, where $B_{\delta/2}$ again denotes the closed ball of radius $\delta/2$ centered at $x_0$. This function is positive and continuous on $S^{\delta/2}$, so attains its (positive) infimum on this compact set. Thus there exists $p > 0$ such that $\int_B f_0 \ge p$ for any Borel subset $B$ of $\mathbb{R}^d$ containing a ball of radius $\delta/2$ centered at a point in $S^{\delta/2}$.

Now let $f$ be any log-concave density on $\mathbb{R}^d$, and let $c = 2 \inf_{x \in S} f(x)$. We show that if $c > 0$ is sufficiently small, then $f$ is not the maximum likelihood estimator for large $n$. By Lemma 3(a), we may assume $\sup_{x \in \mathbb{R}^d} f(x) \le C$. Recall that the density $g$ was defined by $g(x) = e^{-\|x\| + b}$, where $b$ is a normalisation constant, and that $k = -\int \|x\| f_0(x) \, dx + b - 1$. Suppose $c \in [0, C]$ is small enough that $\frac{p}{2} \log c + (1 - \frac{p}{2}) \log C \le k$. Writing $B = \{x \in \mathbb{R}^d : f(x) \le c\}$, we note that $B$ contains a point $x_0 \in S$, and if $x \notin B$, then $2x_0 - x \in B$. Thus $B$ contains a ball of radius $\delta/2$ centered at a point in $S^{\delta/2}$, so $\int_B f_0 \ge p$. Thus, if $\phi = \log f$, then

$$\mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \phi(X_i) > k \Bigr) \le \mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n 1_{\{X_i \in B\}} < \frac{p}{2} \Bigr) \le e^{-np^2/2},$$

again by Hoeffding's inequality. By the first Borel-Cantelli lemma, and arguing as in the proof of Lemma 3(a) above, we conclude that

$$\mathbb{P}\Bigl( \frac{1}{n} \sum_{i=1}^n \log f(X_i) \ge \frac{1}{n} \sum_{i=1}^n \log g(X_i) \ \text{i.o.} \Bigr) = 0.$$

□



Our next theorem is the main result in this paper and establishes desirable performance properties of the log-concave maximum likelihood estimator. We first recall that the Kullback-Leibler divergence of a density $f$ from $f_0$ is given by

$$d_{KL}(f_0, f) = \int_{\mathbb{R}^d} f_0 \log \frac{f_0}{f}.$$

Jensen's inequality shows that the Kullback-Leibler divergence is non-negative, and equal to zero if and only if $f = f_0$ (almost everywhere). Thus in the case where $f_0$ is log-concave, Theorem 4 below shows that the log-concave maximum likelihood estimator $\hat{f}_n$ is strongly consistent in certain exponentially weighted total variation metrics. Convergence in exponentially weighted supremum norms also follows if $f_0$ is continuous. The theorem strengthens known results even in the univariate case, which include Corollary 1 of Pal, Woodroofe and Meyer (2007), where it was proved that $\hat{f}_n$ is strongly consistent in Hellinger distance, and Corollary 4.2 of Dümbgen and Rufibach (2009), where (weak) consistency of $\hat{f}_n$ in the unweighted total variation distance was established. (The observation that the mode of convergence in the univariate consistency result of Corollary 4.2 of Dümbgen and Rufibach (2009) could be strengthened was also made independently at around the same time by Schumacher, Hüsler and Dümbgen (2009).)
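The definition of the Kullback-Leibler divergence above can be checked numerically in simple univariate cases. The sketch below approximates $d_{KL}(f_0, f)$ by a Riemann sum for $f_0 = N(0, 1)$ against $N(1, 1)$ (where the closed form is $1/2$), against itself, and against a standard Laplace density. The helper name `kl_divergence`, the truncation to $[-12, 12]$ and the step size are illustrative assumptions of ours.

```python
import math

def normal_logpdf(x, mean=0.0):
    """Log-density of N(mean, 1)."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

def laplace_logpdf(x):
    """Log-density of the standard Laplace distribution."""
    return -abs(x) - math.log(2.0)

def kl_divergence(log_f0, log_f, lim=12.0, step=1e-3):
    """Riemann-sum approximation to the integral of f0 * (log f0 - log f) over [-lim, lim]."""
    m = int(round(2.0 * lim / step))
    total = 0.0
    for i in range(m + 1):
        x = -lim + i * step
        total += math.exp(log_f0(x)) * (log_f0(x) - log_f(x)) * step
    return total

kl_shifted = kl_divergence(normal_logpdf, lambda x: normal_logpdf(x, mean=1.0))
kl_self = kl_divergence(normal_logpdf, normal_logpdf)
kl_laplace = kl_divergence(normal_logpdf, laplace_logpdf)
print(kl_shifted, kl_self, kl_laplace)
```

The three values are consistent with the non-negativity given by Jensen's inequality, with zero attained only when the two densities coincide.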



In the case where the model is misspecified, i.e. $f_0$ is not log-concave, we prove the existence and uniqueness of a log-concave density $f^*$ that minimises the Kullback-Leibler divergence from $f_0$. Moreover, we show that the log-concave maximum likelihood estimator $\hat{f}_n$ converges in the same senses as in the previous paragraph to $f^*$. The natural practical interpretation is that provided $f_0$ is not too far from being log-concave, the estimator is still sensible.

We write $\log_+ x = \max(\log x, 0)$ and recall that $E$ denotes the support of $f_0$.

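Theorem 4. Let $f_0$ be any density on $\mathbb{R}^d$ with $\int_{\mathbb{R}^d} \|x\| f_0(x) \, dx < \infty$, $\int_{\mathbb{R}^d} f_0 \log_+ f_0 < \infty$ and $\mathrm{int}(E) \ne \emptyset$. There exists a log-concave density $f^*$, unique almost everywhere, that minimises the Kullback-Leibler divergence of $f$ from $f_0$ over all log-concave densities $f$. Taking $a_0 > 0$ and $b_0 \in \mathbb{R}$ such that $f^*(x) \le e^{-a_0\|x\| + b_0}$, we have for any $a < a_0$ that

$$\int_{\mathbb{R}^d} e^{a\|x\|} |\hat{f}_n(x) - f^*(x)| \, dx \stackrel{a.s.}{\to} 0,$$

and, if $f^*$ is continuous, $\sup_{x \in \mathbb{R}^d} e^{a\|x\|} |\hat{f}_n(x) - f^*(x)| \stackrel{a.s.}{\to} 0$.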



Remark: The conditions on the underlying density $f_0$ are very weak indeed. The first condition asks for a finite mean, while the second is satisfied by any bounded density, as well as a wide class of unbounded densities. The third condition is also very weak, but it may help to give an example where it fails: let $(q_n)$ be an enumeration
of the rationals in [0, 1], and let fo oc 1e, where E = [0, 1] \ Uj^;^(g„ — 5'n + ^^j)- 
In this case $\mathrm{int}(E) = \emptyset$.



Proof. By the two integrability conditions, the log-concave density $g(x) = e^{-\|x\| + b}$, where $b$ is a normalisation constant, satisfies $d_{KL}(f_0, g) < \infty$. We can therefore pick a minimising sequence of log-concave densities $(f_n)$ for the Kullback-Leibler divergence from $f_0$; in other words, the sequence $(f_n)$ satisfies

$$d_{KL}(f_0, f_n) \to \inf_{f \in \mathcal{F}_0} d_{KL}(f_0, f),$$

where $\mathcal{F}_0$ denotes the class of all log-concave densities. A slightly simpler version of the argument given in the proof of Lemma 3(a) shows that there exists $C > 0$ such that $f_n \le C$ for all $n$. Similarly, a small modification to the argument in the proof of Lemma 3(b) shows that for every compact subset $S$ of $\mathrm{int}(E)$, there exists $c > 0$ such that

$$\liminf_{n \to \infty} \inf_{x \in \mathrm{conv}\, S} f_n(x) \ge c.$$

We claim that the sequence $(f_n)$ is tight (or more precisely that the sequence of probability measures corresponding to the sequence of densities is tight). To see this, let $S$ be a compact subset in $\mathrm{int}(E)$, and choose $c > 0$ such that $\inf_{x \in S} f_n(x) \ge c$ for large $n$. Without loss of generality we assume $0 \in S$ and $\mu(S) > 0$. Now, for $R$ sufficiently large, we must have $\limsup_{n \to \infty} \sup_{\|x\| \ge R} f_n(x) \le c/2$, as otherwise the Lebesgue measure of the convex level sets $\{x \in \mathbb{R}^d : f_n(x) \ge c/2\}$ would be too large for each $f_n$ to be a density. It follows that there exist $a_0 > 0$ and $b_0 \in \mathbb{R}$ such that $\sup_{n \in \mathbb{N}} f_n(x) \le e^{-a_0\|x\| + b_0}$, and tightness of the sequence follows.



Prohorov's theorem (Billingsley, 1999, Theorem 5.1) therefore guarantees the existence of a probability measure $\nu^*$ such that a subsequence $(f_{n_k})$ converges in distribution to $\nu^*$. Now, given $\epsilon > 0$, choose $\delta = \epsilon/(2C)$. If $A$ is a Borel subset of $\mathbb{R}^d$ with $\mu(A) \le \delta$, then since Lebesgue measure is regular we can find an open set $A' \supseteq A$ such that $\mu(A') \le 2\delta$. Now

$$\nu^*(A) \le \nu^*(A') \le \liminf_{k \to \infty} \int_{A'} f_{n_k} \le C \mu(A') \le \epsilon.$$

Thus $\nu^*$ is absolutely continuous with respect to $\mu$, and we may write $f^*$ for its density with respect to $\mu$. By Proposition 2(a), $f^*$ is log-concave, and by Proposition 2(b), $f_{n_k} \to f^*$ almost everywhere. Finally, by Fatou's lemma, we have

$$d_{KL}(f_0, f^*) = \int f_0 (\log f_0 - \log f^*) \le \liminf_{k \to \infty} \int f_0 (\log f_0 - \log f_{n_k}) = \inf_{f \in \mathcal{F}_0} d_{KL}(f_0, f).$$

Thus $f^*$ does indeed minimise the Kullback-Leibler divergence from $f_0$ over the class of log-concave densities.

Suppose now that both $f_1^*$ and $f_2^*$ minimise the Kullback-Leibler divergence from $f_0$ over the class of log-concave densities. Defining

$$f^* = \frac{(f_1^* f_2^*)^{1/2}}{\int (f_1^* f_2^*)^{1/2}},$$

we see that $f^*$ is a log-concave density with

$$d_{KL}(f_0, f^*) = d_{KL}(f_0, f_1^*) + \log \int (f_1^* f_2^*)^{1/2} \le d_{KL}(f_0, f_1^*),$$

by the Cauchy-Schwarz inequality, with equality if and only if $f_1^* = f_2^*$, $\mu$-almost everywhere. This proves the claimed uniqueness property of $f^*$.
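The identity used in the uniqueness argument can be checked numerically in a toy univariate case. In the sketch below we take $f_0 = N(0, 1)$ together with the two log-concave densities $f_1 = N(-1, 1)$ and $f_2 = N(1, 1)$, which have equal Kullback-Leibler divergence from $f_0$ (they are not minimisers; they serve only to illustrate the algebra). The normalised geometric mean is then compared with $d_{KL}(f_0, f_1) + \log \int (f_1 f_2)^{1/2}$. All function names, the truncation to $[-12, 12]$ and the step size are illustrative assumptions of ours.

```python
import math

LIM, STEP = 12.0, 1e-3
GRID = [-LIM + i * STEP for i in range(int(round(2.0 * LIM / STEP)) + 1)]

def normal_logpdf(x, mean):
    """Log-density of N(mean, 1)."""
    return -0.5 * (x - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)

def log_f0(x):
    return normal_logpdf(x, 0.0)

def log_f1(x):
    return normal_logpdf(x, -1.0)

def log_f2(x):
    return normal_logpdf(x, 1.0)

def kl(log_p, log_q):
    """Riemann-sum approximation to the Kullback-Leibler divergence of q from p."""
    return sum(math.exp(log_p(x)) * (log_p(x) - log_q(x)) * STEP for x in GRID)

# Normalising constant of the geometric mean (f1 * f2)^(1/2); at most 1 by Cauchy-Schwarz.
Z = sum(math.exp(0.5 * (log_f1(x) + log_f2(x))) * STEP for x in GRID)

def log_geometric_mean(x):
    """Log-density of the normalised geometric mean of f1 and f2."""
    return 0.5 * (log_f1(x) + log_f2(x)) - math.log(Z)

lhs = kl(log_f0, log_geometric_mean)
rhs = kl(log_f0, log_f1) + math.log(Z)
print(lhs, rhs, Z)
```

Here $\log Z < 0$, so the geometric mean is strictly closer to $f_0$ than either $f_1$ or $f_2$, which is the mechanism behind the contradiction in the uniqueness proof.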

Now, write $F_0$ for the distribution function corresponding to the density $f_0$ and $P_0$ for the distribution on $\mathbb{R}^d$ induced by $F_0$. Similarly, write $F_n$ for the empirical distribution function of $X_1, \ldots, X_n$, and $P_n$ for the corresponding empirical measure. By definition of $\hat{f}_n$, we have for any $b > 0$ that

$$0 \le \int \log(b + \hat{f}_n) \, dF_n - \int \log f^* \, dF_n$$

$$= \int \log(b + \hat{f}_n) \, d(F_n - F_0) + \int \log\Bigl( \frac{b + \hat{f}_n}{b + f^*} \Bigr) dF_0 + \int \log\Bigl( \frac{b + f^*}{f^*} \Bigr) dF_0 + \int \log f^* \, d(F_0 - F_n). \qquad (3.2)$$



The idea of adding the small constant $b > 0$ in this calculation first appeared in Pal, Woodroofe and Meyer (2007). We first derive an appropriate uniform law of large numbers to handle the first term on the right hand side of (3.2). By Lemma 3(a), we may assume that $\hat{f}_n \le C$. Recall that $\mathcal{D}$ denotes the class of all Borel-measurable convex subsets of $\mathbb{R}^d$. For any log-concave density $f$ with $f \le C$, we have by Fubini's theorem that

$$\int \log(b + f) \, d(F_n - F_0) = \int \log(1 + f/b) \, d(F_n - F_0)$$

$$= \int_{\mathbb{R}^d} \int_0^{\log(1 + C/b)} 1_{\{t \le \log(1 + f/b)\}} \, dt \, d(F_n - F_0)$$

$$= \int_0^{\log(1 + C/b)} (P_n - P_0)(\{x : f(x) \ge b(e^t - 1)\}) \, dt$$

$$\le \log\Bigl( 1 + \frac{C}{b} \Bigr) \sup_{D \in \mathcal{D}} |(P_n - P_0)(D)| \stackrel{a.s.}{\to} 0$$

as $n \to \infty$. Hence

$$\int \log(b + \hat{f}_n) \, d(F_n - F_0) \stackrel{a.s.}{\to} 0$$

as $n \to \infty$.



Combining this result with an application of the strong law of large numbers to the fourth term on the right-hand side of (3.2), we deduce that with probability one,

$$\limsup_{n \to \infty} \int \log\Bigl( \frac{b + f^*}{b + \hat{f}_n} \Bigr) dF_0 \le \int \log\Bigl( \frac{b + f^*}{f^*} \Bigr) dF_0.$$

It follows by the monotone convergence theorem that with probability one,

$$\limsup_{b \searrow 0} \limsup_{n \to \infty} \int_{\mathbb{R}^d} \log\Bigl( \frac{b + f^*}{b + \hat{f}_n} \Bigr) dF_0 \le 0.$$

Lemma 6 in the Appendix allows us to deduce from this that $\int_{\mathbb{R}^d} |\hat{f}_n - f^*| \stackrel{a.s.}{\to} 0$, so the full result follows by Proposition 2. □



4 Appendix 

Before proving Lemma 1, we first derive a basic property of a log-concave density $f$. Recall that the epigraph of a concave function $\phi : \mathbb{R}^d \to [-\infty, \infty)$ is the set

$$\{(x, \mu) : x \in \mathbb{R}^d, \mu \in \mathbb{R}, \mu \le \phi(x)\}.$$

The closure of $\phi$, denoted $\mathrm{cl}(\phi)$, is the concave function whose epigraph is the closure in $\mathbb{R}^{d+1}$ of the epigraph of $\phi$. The functions $\phi$ and $\mathrm{cl}(\phi)$ agree almost everywhere, and we say $\phi$ is closed if $\phi = \mathrm{cl}(\phi)$.

Lemma 5. A log-concave density $f$ is bounded above and the version of $f$ that is closed attains its maximum.

Proof. Without loss of generality, we may assume $\log f$ is closed. It has no directions of increase, because otherwise there would exist $c \in \mathbb{R}$ such that the set $\{x \in \mathbb{R}^d : \log f(x) \ge c\}$ were $d$-dimensional, convex and unbounded (so of infinite Lebesgue measure). Theorem 27.2 of Rockafellar (1997) therefore gives that $\log f$ attains its (finite) maximum. □
We can now prove Lemma 1.

Proof of Lemma 1.

Let $\phi = \log f$. Without loss of generality, we may assume that $0 \in \mathrm{int}(\mathrm{dom}\, \phi)$. Since $\phi(x) \to -\infty$ as $\|x\| \to \infty$, we can find $R > 0$ such that $\phi(x) - \phi(0) \le -1$ for all $\|x\| \ge R$. But, for each $z \in \mathbb{R}^d$ with $\|z\| = 1$, we have that $\frac{1}{c}\{\phi(cz) - \phi(0)\}$ is non-increasing in $c > 0$. Thus, with $a = 1/R$, we have $\frac{1}{c}\{\phi(cz) - \phi(0)\} \le -a$ for $c \ge R$. Now

$$\frac{\phi(x)}{\|x\|} = \frac{\phi(x) - \phi(0)}{\|x\|} + \frac{\phi(0)}{\|x\|} \le -a + \frac{\phi(0)}{\|x\|}$$

for all $\|x\| \ge R$. Since $\phi$ is bounded above (by Lemma 5), the result follows by choosing $b \ge \phi(0)$ sufficiently large. □

The following lemma is used in the proof of Theorem 4. The conclusion can be immediately strengthened using Proposition 2 and is stated in this way only for brevity.

Lemma 6. Let $f_0$ be any density on $\mathbb{R}^d$ with $\int_{\mathbb{R}^d} \|x\| f_0(x) \, dx < \infty$, $\int_{\mathbb{R}^d} f_0 \log_+ f_0 < \infty$ and $\mathrm{int}(E) \ne \emptyset$. Let $f^*$ be a log-concave density that minimises the Kullback-Leibler divergence from $f_0$ over the class of log-concave densities. If $(f_n)$ is a sequence of log-concave densities satisfying

$$\limsup_{b \searrow 0} \limsup_{n \to \infty} \int_{\mathbb{R}^d} \log\Bigl( \frac{b + f^*}{b + f_n} \Bigr) dF_0 \le 0,$$

then $\int_{\mathbb{R}^d} |f_n - f^*| \to 0$ as $n \to \infty$.

Proof. Let $\Phi : \mathbb{R} \to \mathbb{R}$ be the Young function $\Phi(x) = (1 + |x|) \log(1 + |x|) - |x|$. The Orlicz space $L^\Phi$ is the set of (equivalence classes of) measurable functions $f : \mathbb{R}^d \to \mathbb{R}$, whose Luxemburg norm $\|f\|_\Phi$, given by

$$\|f\|_\Phi = \inf\Bigl\{ \lambda > 0 : \int \Phi(|f|/\lambda) \le 1 \Bigr\},$$

is finite. Let $\Psi(y) = e^{|y|} - |y| - 1$ denote the Young conjugate of $\Phi$ and let $\|\cdot\|_\Psi$ denote the corresponding Luxemburg norm on $L^\Psi$. Then by Rao and Ren (1991, Proposition 1, p. 58), and the remark following it, for $f \in L^\Phi$ and $g \in L^\Psi$, we have

$$\int |fg| \le 2 \|f\|_\Phi \|g\|_\Psi.$$
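The conjugacy of the two Young functions can be sanity-checked numerically through Young's inequality $xy \le \Phi(x) + \Psi(y)$ for $x, y \ge 0$, with equality when $y = \log(1 + x)$. The sketch below is an illustrative check on a grid of our own choosing; the names `young_phi` and `young_psi` are hypothetical labels for the functions $\Phi(x) = (1 + x)\log(1 + x) - x$ and $\Psi(y) = e^y - y - 1$ restricted to non-negative arguments.

```python
import math

def young_phi(x):
    """Young function Phi(x) = (1 + x) log(1 + x) - x for x >= 0."""
    return (1.0 + x) * math.log1p(x) - x

def young_psi(y):
    """Conjugate Young function Psi(y) = exp(y) - y - 1 for y >= 0."""
    return math.expm1(y) - y

grid = [i / 20.0 for i in range(0, 201)]  # points in [0, 10]

# Young's inequality: the gap Phi(x) + Psi(y) - x*y should be non-negative everywhere.
min_gap = min(young_phi(x) + young_psi(y) - x * y for x in grid for y in grid)

# Equality case: y = log(1 + x) attains the supremum defining the conjugate.
max_equality_error = max(
    abs(young_phi(x) + young_psi(math.log1p(x)) - x * math.log1p(x)) for x in grid
)
print(min_gap, max_equality_error)
```

Both quantities agree with the conjugacy relation up to floating-point rounding.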
Noting that $\|f_0\|_\Phi < \infty$, an immediate application of this formula yields that for any Borel subset $D$ of $\mathbb{R}^d$,

$$\int_D f_0 \le \frac{2 \|f_0\|_\Phi}{-\log \mu(D)}. \qquad (4.1)$$

Now let $f$ be a log-concave density with $\sup_{x \in \mathbb{R}^d} f(x) = C = e^M$. For large $M$, we have as in the proof of Lemma 3(a) that $\mu(\{x : f(x) \ge 1\}) \le 2 M^d e^{-M} \le e^{-M/2}$. It follows that for any $b_0 \in (0, 1)$ and $b \le b_0$, we have

$$\int \log(b + f) \, dF_0 \le \int \log(b_0 + f) \, dF_0 \le \log(2 b_0) + \int_{f \ge b_0} \log(2f) \, dF_0$$

$$\le 2 \log 2 + \log b_0 + \int_{f \ge 1} \log f \, dF_0$$

$$\le 2 \log 2 + \log b_0 + \frac{2 M \|f_0\|_\Phi}{-\log \mu(\{x : f(x) \ge 1\})}$$

$$\le 2 \log 2 + \log b_0 + 4 \|f_0\|_\Phi \to -\infty$$

as $b_0 \to 0$. Here, the penultimate inequality uses (4.1). We deduce that the sequence $(f_n)$ in the statement of the lemma satisfies the condition that there exists $C > 1$ such that

$$\sup_{n \in \mathbb{N}} \sup_{x \in \mathbb{R}^d} f_n(x) \le C.$$

Now let $S$ be a compact subset of $\mathrm{int}(E)$. Find $\delta > 0$ such that $S^\delta \subseteq \mathrm{int}(E)$ and, as in the proof of Lemma 3(b), find $p > 0$ such that $\int_B f_0 \ge p$ for all Borel subsets $B$ of $\mathbb{R}^d$ that contain a ball of radius $\delta/2$ centered at a point in $S^{\delta/2}$. Let $f$ be any log-concave density on $\mathbb{R}^d$ with $\sup_{x \in \mathbb{R}^d} f(x) \le C$, and write $c = 2 \inf_{x \in S} f(x)$. If $c \in [0, C]$ is sufficiently small, then we can find $b_0 > 0$ small enough that

$$p \log(b_0 + c) + (1 - p) \log(b_0 + C) \le \int \log f^* \, dF_0 - 1.$$

Then writing $B = \{x : f(x) \le c\}$, we have for all $b \in (0, b_0)$ that

$$\int \log(b + f) \, dF_0 \le \int_B \log(b_0 + c) \, dF_0 + \int_{B^c} \log(b_0 + C) \, dF_0 \le \int \log f^* \, dF_0 - 1.$$

We deduce that there exists $c > 0$ such that

$$\liminf_{n \to \infty} \inf_{x \in \mathrm{conv}\, S} f_n(x) \ge c. \qquad (4.2)$$
As in the proof of Theorem 4, we have from (4.2) that the sequence $(f_n)$ is tight. Thus if $(f_{n_k})$ is an arbitrary subsequence of $(f_n)$, then there exists a further subsequence $(f_{n_{k(l)}})$ and a log-concave density $f$ such that $\int |f_{n_{k(l)}} - f| \to 0$. But then, by the dominated and monotone convergence theorems,

$$\limsup_{b \searrow 0} \limsup_{l \to \infty} \int_{\mathbb{R}^d} \log\Bigl( \frac{b + f^*}{b + f_{n_{k(l)}}} \Bigr) dF_0 = \limsup_{b \searrow 0} \int_{\mathbb{R}^d} \log\Bigl( \frac{b + f^*}{b + f} \Bigr) dF_0 = \int_{\mathbb{R}^d} \log \frac{f^*}{f} \, dF_0 \ge 0,$$

with equality if and only if $f = f^*$ almost everywhere. By the hypothesis of the lemma, we must have $\int |f_{n_{k(l)}} - f^*| \to 0$. Since every subsequence of $(f_n)$ has a further subsequence converging in total variation norm to $f^*$, we must have $\int |f_n - f^*| \to 0$. □